upload method for stagehand #1037
- Removed the file download function from the upload example, simplifying the process by allowing direct URL uploads.
- Integrated dotenv for environment variable management.
- Updated the upload method to handle URLs directly, downloading files as needed.
- Improved error messages for better debugging and user feedback.
- Adjusted logging to provide clearer insights into the upload process.

- Set the environment to "BROWSERBASE" in the Stagehand initialization.
- Enhanced debugging by adding console logs for element evaluation in the file input check.
🦋 Changeset detected
Latest commit: a137da5
The changes in this PR will be included in the next version bump.
Greptile Summary
This PR introduces file upload functionality to Stagehand, addressing a significant capability gap in the web automation library. The implementation adds a new upload() method that accepts natural language instructions to locate file inputs and supports multiple file sources including URLs, local file paths, and in-memory buffers.
The core architecture employs a three-tier fallback strategy: first attempting direct file input detection, then triggering file choosers, and finally using heuristic search to find associated upload controls. The method leverages Stagehand's existing observe() functionality for AI-powered element detection, ensuring it works with complex scenarios like hidden inputs, iframes, and shadow DOM elements.
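The fallback order described above can be sketched as a small strategy runner. This is only an illustration under stated assumptions: `Strategy`, `Outcome`, and `runWithFallback` are hypothetical names, not Stagehand's implementation, and the real strategies would drive a browser page rather than return canned values.

```typescript
// Hypothetical sketch of a three-tier fallback; none of these names come from Stagehand.
type StrategyName = "input" | "chooser" | "fallback";

interface Strategy {
  name: StrategyName;
  // Resolves to true when the files were attached, false when the strategy does not apply.
  attempt: () => Promise<boolean>;
}

interface Outcome {
  ok: boolean;
  via?: StrategyName; // which strategy succeeded, if any
  errors: string[];   // why earlier strategies were skipped or failed
}

// Try each strategy in order and stop at the first one that succeeds.
async function runWithFallback(strategies: Strategy[]): Promise<Outcome> {
  const errors: string[] = [];
  for (const strategy of strategies) {
    try {
      if (await strategy.attempt()) {
        return { ok: true, via: strategy.name, errors };
      }
      errors.push(`${strategy.name}: not applicable`);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${strategy.name}: ${message}`);
    }
  }
  return { ok: false, errors };
}
```

Collecting the per-strategy errors instead of throwing on the first failure keeps the final error message useful when all three tiers fail.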
Key additions include new TypeScript types (FileSpec and UploadResult) that provide flexible file specification options and structured feedback about upload operations. The implementation handles URL fetching with automatic MIME type detection, processes files entirely in memory without temporary files, and maintains consistent behavior across both local and Browserbase environments.
The upload method integrates seamlessly with Stagehand's existing history tracking system, logging upload events for observability. An example test file demonstrates the intended usage pattern: providing natural language hints to identify upload elements, followed by standard form interactions.
Confidence score: 3/5
- This PR introduces complex new functionality with multiple execution paths that could fail in various edge cases
- Score reflects concerns about package.json changes that could break build processes and potential type definition issues in FileSpec
- Pay close attention to lib/package.json build configuration changes and types/stagehand.ts FileSpec type definition
4 files reviewed, 4 comments
export type FileSpec =
  | string
  | {
      /** Absolute path on disk to the file. */
      path?: string;
      /** Name to use for the file (required if using buffer). */
      name?: string;
      /** MIME type for the file (required if using buffer). */
      mimeType?: string;
      /** Raw file bytes (requires name + mimeType). */
      buffer?: Buffer;
    };
logic: FileSpec type allows invalid combinations like { name: 'file.txt' } without buffer or path. Consider using discriminated unions to enforce valid combinations.
- export type FileSpec =
-   | string
-   | {
-       /** Absolute path on disk to the file. */
-       path?: string;
-       /** Name to use for the file (required if using buffer). */
-       name?: string;
-       /** MIME type for the file (required if using buffer). */
-       mimeType?: string;
-       /** Raw file bytes (requires name + mimeType). */
-       buffer?: Buffer;
-     };
+ export type FileSpec =
+   | string
+   | {
+       /** Absolute path on disk to the file. */
+       path: string;
+     }
+   | {
+       /** Name to use for the file (required if using buffer). */
+       name: string;
+       /** MIME type for the file (required if using buffer). */
+       mimeType: string;
+       /** Raw file bytes (requires name + mimeType). */
+       buffer: Buffer;
+     };
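To illustrate why the stricter union helps, here is a minimal sketch of narrowing it at the call site. It assumes a bare string is a URL, as the later review summary states; `FileSource` and `classify` are hypothetical names, and `Uint8Array` stands in for Node's `Buffer` (which is a `Uint8Array` subclass) so the snippet has no Node type dependency.

```typescript
// Mirrors the suggested FileSpec shape; Uint8Array stands in for Buffer here.
type FileSpec =
  | string
  | { path: string }
  | { name: string; mimeType: string; buffer: Uint8Array };

// Hypothetical normalized form, not part of the PR.
type FileSource =
  | { kind: "url"; url: string }
  | { kind: "path"; path: string }
  | { kind: "buffer"; name: string; mimeType: string; bytes: Uint8Array };

function classify(spec: FileSpec): FileSource {
  if (typeof spec === "string") {
    // Assumption: a bare string is treated as a URL.
    return { kind: "url", url: spec };
  }
  if ("path" in spec) {
    // The `in` check narrows spec to the { path } member of the union.
    return { kind: "path", path: spec.path };
  }
  // Only the buffer member remains, so all three fields are guaranteed.
  return { kind: "buffer", name: spec.name, mimeType: spec.mimeType, bytes: spec.buffer };
}

// With this union the partial object from the review comment no longer compiles:
// const bad: FileSpec = { name: "file.txt" }; // type error
```

Because every object member has only required fields, the compiler rejects partial combinations before they can fail at runtime.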
lib/index.ts
Outdated
(el): boolean => {
  const tagName = el.tagName.toLowerCase();
  const type = (el as HTMLInputElement).type;
  console.log(`DEBUG: Element tagName=${tagName}, type=${type}`);
style: Debug console.log statements should be removed from production code
- console.log(`DEBUG: Element tagName=${tagName}, type=${type}`);
+ const tagName = el.tagName.toLowerCase();
+ const type = (el as HTMLInputElement).type;
+ return tagName === "input" && type === "file";
lib/index.ts
Outdated
  },
)
.catch((e) => {
  console.log(`DEBUG: evaluate failed:`, e);
style: Debug console.log statements should be removed from production code
- console.log(`DEBUG: evaluate failed:`, e);
+ .catch((e) => {
+   return false;
+ });
lib/package.json
Outdated
"main": "./index.js",
"module": "./index.js",
"types": "./index.d.ts",
logic: Path change from '../dist/' to './' may break builds if corresponding build scripts haven't been updated. Verify that 'pnpm run build-js' outputs to lib/ instead of dist/.
Greptile Summary
This review covers only the changes made since the last review (commit 318aef5), not the entire PR. The recent changes focus on implementing the feedback from previous reviews regarding package.json path configuration and FileSpec type safety improvements.
The key changes include:
- Package.json Path Correction: The entry points have been updated from `../dist/` back to `./` paths (main, module, and types fields), which appears to address the build output concerns raised in the previous review about potential build script mismatches.
- FileSpec Type Safety Enhancement: The `FileSpec` type has been restructured to use proper discriminated unions as suggested in the previous review. The type now enforces three distinct valid combinations: string URLs, objects with required `path` property, or objects with required `name`, `mimeType`, and `buffer` properties. This prevents invalid partial combinations that could fail at runtime.
- Type Guard Implementation: The upload method implementation now includes proper type guards to safely handle the discriminated union FileSpec type, with consistent logic for extracting file names across all three upload strategies (input, chooser, fallback).
- Debug Code Cleanup: Console.log statements have been removed from the production code as requested in previous reviews.
These changes address the type safety, build configuration, and code quality concerns identified in previous reviews while maintaining the core upload functionality. The implementation follows TypeScript best practices with discriminated unions and type guards to ensure compile-time safety for the flexible file specification system.
Confidence score: 4/5
- This PR addresses previous review feedback effectively with proper type safety and build configuration fixes
- Score reflects solid implementation of discriminated unions and removal of debug code, but package.json changes need verification
- Pay close attention to build scripts to ensure they output to the correct directory matching the package.json paths
3 files reviewed, no comments
@greptileai review again
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest changes focus on completing the upload functionality implementation with the addition of a test file and some final adjustments.
The changes add a new uploads.test.ts file in the deterministic test suite that provides comprehensive testing for the new upload functionality. This test follows the established testing patterns in the codebase, using the same Stagehand initialization approach with StagehandConfig and proper setup/teardown hooks. The test verifies the upload feature by navigating to a test URL, locating a file input element, uploading a logo.png file, and asserting that both the file name and size are correctly displayed.
The test integrates well with the existing deterministic test infrastructure in the evals/deterministic/tests/browserbase/ directory, maintaining consistency with other Browserbase-specific tests. It uses standard Playwright testing patterns with locators and assertions, and follows the same asynchronous testing approach as other tests in the suite. The test validates the core upload functionality by checking that the uploaded file's metadata (name and non-zero size) is correctly processed and displayed, ensuring the upload method works as expected in a real browser environment.
Confidence score: 4/5
- This PR adds well-tested upload functionality with minimal risk of breaking existing features
- Score reflects solid implementation with proper testing coverage and integration with existing patterns
- The upload test file requires attention to ensure it aligns with the final API and file structure
3 files reviewed, no comments
# why
solves browserbase#1060 patch regression of playwright arguments being removed from agent execute response
# what changed
agent.execute now returns playwright arguments in its response
# test plan
tested locally

…ms to docs (browserbase#1065)
# why
reflect project id changes in docs
# what changed
advanced configuration comments
# test plan
reviewed via mintlify on localhost

# why
Easier to use for Custom LLM Clients and keep users up to date with our aisdk file
# what changed
added export of aisdk to lib/index.ts
# test plan
build local stagehand, import local AISdkClient, run Azure Stagehand session

…onfigu… (browserbase#1073) …ration settings
# why
Updated docs to match the new fingerprint params in the Browserbase docs here: https://docs.browserbase.com/guides/stealth-customization#customization-options
# what changed
Update browser configuration docs to reflect the docs changes.
# test plan

# why
Updating docs to reflect aisdk can be imported directly
# what changed
The model page
# test plan
Reviewed page with mintlify dev locally
# why
Currently, we do not support stagehand agent within the api
# what changed
When api is enabled, stagehand agent now routes through the api
# test plan
Tested locally

# why
Currently, Playwright's screenshot command is not available when the execution environment is Stagehand. A customer has indicated they would prefer to use Playwright's native screenshot command instead of CDP when using Browserbase, as CDP screenshot causes unexpected behavior for their target site.
# what changed
- added a StagehandScreenshotOptions type with a useCDP argument
- extended page type to accept custom stagehand screenshot options
- update screenshot proxy to default useCDP to true if the env is browserbase and use playwright screenshot if false
- added eval for screenshot with and without cdp
# test plan
- tested and confirmed functionality with eval and external example script (not committed)
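The defaulting rule in this commit can be sketched as a small resolver. This is an assumption-laden illustration: `shouldUseCDP` and the `Env` type are hypothetical, and the commit message does not say what the default is outside Browserbase, so the sketch simply falls back to `false` there.

```typescript
// Hypothetical sketch; only StagehandScreenshotOptions and useCDP are named in the commit.
type Env = "LOCAL" | "BROWSERBASE";

interface StagehandScreenshotOptions {
  useCDP?: boolean;
}

// An explicit useCDP always wins; otherwise default to CDP only on Browserbase.
function shouldUseCDP(env: Env, options: StagehandScreenshotOptions = {}): boolean {
  return options.useCDP ?? (env === "BROWSERBASE");
}
```

Using `??` rather than `||` matters here: an explicit `useCDP: false` must not be overridden by the environment default.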
…rowserbase#1057)
# why
We want to build a best in class agent in stagehand. Therefore, we need more eval benchmarks.
# what changed
- Added Web-bench evals dataset
- Added a subset of OS World evals: those that can be run in a chrome browser (desktop-based tasks omitted)
- added LICENSE notices to the copied evals tasks
- Added ground truth / expected result to some WebVoyager tasks using reference_answer.json from Browser Use public evals repo.

Improvements to `pnpm run evals -man` to better describe how to run evals.
# test plan
Evals should run locally and bb for these new benchmarks.

# why
Initial instructions didn't mention uv or pip prerequisites and also didn't mention venv. Fix reduces friction on first timers.
# what changed
- added link to install uv
- added details for initializing venv
- adjusted code example respectively
# test plan
docs change

# why
- webpage structure changed, needed to update the xpath in the expected locator

… with LanguageModelV1 + LiteLLM works for python (browserbase#1086)
# why
1. aisdk not yet available through npm package
2. customLLM provider only works with LanguageModelV1
3. LiteLLM compatible providers are supported in python
# what changed
1. change docs to install stagehand from git repo
2. pin versions that use LanguageModelV1
# test plan
local test

# why
currently we pass stagehand page to agent, this results in our page management having issues when facing new tabs
# what changed
the stagehand object is now passed instead of stagehandPage
# test plan
tested locally

# why
Our existing screenshot service is a dummy time-based triggered service. It also does not trigger based on any actions of the agent.
# what changed
Added img hash diff algo (quick check with MSE, verify with SSIM algo) to see if there was an actual UI change and only store ss in the buffer if that is so.
Added ss interceptor which copies each screenshot the agent is taking to a buffer (if different enough from the previous ss) to be later used for evals.
- There's also a small refactor of the agent initialization config to enable the screenshot collector service to be attached
# test plan
Tests pass locally
---------
Co-authored-by: Miguel <[email protected]>
Co-authored-by: miguel <[email protected]>
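As a rough illustration of the "quick check with MSE" step, the sketch below keeps a frame only when its mean squared error against the last stored frame exceeds a threshold. It assumes frames are already decoded to equal-length grayscale byte arrays; `ScreenshotBuffer`, `offer`, and the threshold are hypothetical, and the SSIM verification step mentioned in the commit is deliberately omitted.

```typescript
// Mean squared error between two equally sized grayscale frames (values 0..255).
function meanSquaredError(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must be non-empty and the same size");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum / a.length;
}

// Hypothetical buffer that stores a frame only if it differs enough from the last stored one.
class ScreenshotBuffer {
  private frames: Uint8Array[] = [];
  private mseThreshold: number;

  constructor(mseThreshold: number) {
    this.mseThreshold = mseThreshold;
  }

  // Returns true when the frame was stored.
  offer(frame: Uint8Array): boolean {
    const last = this.frames[this.frames.length - 1];
    const changed =
      last === undefined ||
      last.length !== frame.length || // a resize counts as a change
      meanSquaredError(last, frame) > this.mseThreshold;
    if (changed) this.frames.push(frame);
    return changed;
  }

  get size(): number {
    return this.frames.length;
  }
}
```

MSE is cheap but sensitive to global brightness shifts, which is presumably why the commit pairs it with a structural check before storing a screenshot.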
# why
To help make sense of eval test cases and results
# what changed
Added metadata to eval runs, cleaned deprecated code
# test plan
# why
anthropic released a new sota computer use model
# what changed
added claude-sonnet-4-5-20250929 as a model to the list
# test plan
ran evals

…ase#1103)
Why
Custom AI SDK tools and MCP integrations weren't working properly with Anthropic CUA: parameters were empty {} and tools weren't tracked.
What Changed
- Convert Zod schemas to JSON Schema before sending to Anthropic (using zodToJsonSchema)
- Track custom tool calls in the actions array
- Silence "Unknown tool name" warnings for custom tools
Test Plan
Tested with examples file.
- Parameters passed correctly ({"city":"San Francisco"} instead of {})
- Custom tools execute and appear in actions array
- No warnings

# why
To improve context
# what changed
Added current page and url to the system prompt
# test plan

# why
To inform the user throughout the agent execution process
# what changed
Added logs to tool calls, and on the stagehand agent handler
# test plan
- [x] tested locally
PR to make clearer the dependencies for `extract` (for those who haven't used zod or pydantic before) --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
# why
- before this change, when we convert `z.string().url()` to an ID, if it
was inside a `z.array()`, it was not getting converted back into a URL
- this meant that if you defined a schema like this:
```ts
schema: z.object({
records: z.array(z.string().url()),
})
```
you would receive an array like this:
```
{
records: [
'0-302', '0-309',
'0-316', '0-323',
'0-330', '0-337',
'0-344', '0-351',
'0-358', '0-365'
]
}
```
- with this change, you will now receive the actual URLs, ie:
```
{
records: [
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10003-10041.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10143%20(C06932208).pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10143.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10156.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10004-10213.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10005-10321.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10006-10247.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10007-10345.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10009-10021.pdf',
'https://www.archives.gov/files/research/jfk/releases/2025/0318/104-10009-10222.pdf'
]
}
```
# what changed
- updated the `injectUrls` function so that when it hits an array and
there is no deeper path, it loops through the array and injects the
URLs
# test plan
- evals
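
The array case described above can be sketched as follows. This is not the actual `injectUrls` code: `injectUrlsSketch`, its signature, and the `idToUrl` map are hypothetical, and the example URLs are placeholders. It only demonstrates the stated behavior of looping over an array when no deeper path remains.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: walk `path` into `value` and swap element IDs for URLs.
function injectUrlsSketch(value: Json, path: string[], idToUrl: Map<string, string>): Json {
  if (Array.isArray(value)) {
    // Arrays are transparent: apply the same remaining path to every item.
    // With an empty path this is the "no deeper path" case from the fix.
    return value.map((item) => injectUrlsSketch(item, path, idToUrl));
  }
  if (path.length === 0) {
    // Leaf: replace a known ID, leave anything else untouched.
    return typeof value === "string" ? idToUrl.get(value) ?? value : value;
  }
  if (value !== null && typeof value === "object") {
    const [head, ...rest] = path;
    if (!(head in value)) return value;
    return { ...value, [head]: injectUrlsSketch(value[head], rest, idToUrl) };
  }
  return value;
}

// Usage with placeholder data shaped like the example above.
const idToUrl = new Map([
  ["0-302", "https://example.com/a.pdf"],
  ["0-309", "https://example.com/b.pdf"],
]);
const fixed = injectUrlsSketch({ records: ["0-302", "0-309"] }, ["records"], idToUrl);
```

The sketch returns a new value instead of mutating in place, which keeps it easy to test; the real function may well mutate.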
# why
Adding support for Gemini's new Computer Use model
# what changed
We partnered with Google Deepmind to help integrate and test their new Computer Use models.
<img width="1238" height="655" alt="Screenshot 2025-10-07 at 1 14 44 PM" src="https://github.com/user-attachments/assets/af0d854a-8e55-4937-a071-10335497f686" />
The new model tag `gemini-2.5-pro-computer-use-preview-10-2025` is available for Stagehand Agent. You can try it today with the example `cua-example.ts`
To learn more, check out the blog post [https://www.browserbase.com/blog/evaluating-browser-agents](https://www.browserbase.com/blog/evaluating-browser-agents)
---------
Co-authored-by: tkattkat <[email protected]>
Co-authored-by: Kylejeong2 <[email protected]>
Co-authored-by: Sameel <[email protected]>
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.
# Releases
## @browserbasehq/[email protected]
### Patch Changes
- [browserbase#1082](browserbase#1082) [`8c0fd01`](browserbase@8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page
- [browserbase#1104](browserbase#1104) [`a1ad06c`](browserbase@a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent
- [browserbase#1066](browserbase#1066) [`9daa584`](browserbase@9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response
- [browserbase#1077](browserbase#1077) [`7f38b3a`](browserbase@7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api
- [browserbase#1032](browserbase#1032) [`bf2d0e7`](browserbase@bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support
- [browserbase#1014](browserbase#1014) [`6966201`](browserbase@6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent
- [browserbase#1089](browserbase#1089) [`536f366`](browserbase@536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create
- [browserbase#1103](browserbase#1103) [`889cb6c`](browserbase@889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client
- [browserbase#1056](browserbase#1056) [`6a002b2`](browserbase@6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand
- [browserbase#1090](browserbase#1090) [`8ff5c5a`](browserbase@8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs
- [browserbase#1014](browserbase#1014) [`6966201`](browserbase@6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent
- [browserbase#1107](browserbase#1107) [`3ccf335`](browserbase@3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array
- [browserbase#1102](browserbase#1102) [`a99aa48`](browserbase@a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent
- [browserbase#1110](browserbase#1110) [`dda52f1`](browserbase@dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models
## @browserbasehq/[email protected]
### Minor Changes
- [browserbase#1057](browserbase#1057) [`b7be89e`](browserbase@b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser
### Patch Changes
- [browserbase#1072](browserbase#1072) [`dc2d420`](browserbase@dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent
- Updated dependencies \[[`8c0fd01`](browserbase@8c0fd01), [`a1ad06c`](browserbase@a1ad06c), [`9daa584`](browserbase@9daa584), [`7f38b3a`](browserbase@7f38b3a), [`bf2d0e7`](browserbase@bf2d0e7), [`6966201`](browserbase@6966201), [`536f366`](browserbase@536f366), [`889cb6c`](browserbase@889cb6c), [`6a002b2`](browserbase@6a002b2), [`8ff5c5a`](browserbase@8ff5c5a), [`6966201`](browserbase@6966201), [`3ccf335`](browserbase@3ccf335), [`a99aa48`](browserbase@a99aa48), [`dda52f1`](browserbase@dda52f1)]:
  - @browserbasehq/[email protected]
## @browserbasehq/[email protected]
### Patch Changes
- Updated dependencies \[[`8c0fd01`](browserbase@8c0fd01), [`a1ad06c`](browserbase@a1ad06c), [`9daa584`](browserbase@9daa584), [`7f38b3a`](browserbase@7f38b3a), [`bf2d0e7`](browserbase@bf2d0e7), [`6966201`](browserbase@6966201), [`536f366`](browserbase@536f366), [`889cb6c`](browserbase@889cb6c), [`6a002b2`](browserbase@6a002b2), [`8ff5c5a`](browserbase@8ff5c5a), [`6966201`](browserbase@6966201), [`3ccf335`](browserbase@3ccf335), [`a99aa48`](browserbase@a99aa48), [`dda52f1`](browserbase@dda52f1)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
# why
The original example used JavaScript destructuring syntax [table] which doesn't work in Python. Fixed to use proper Python array indexing.
# what changed
fixed example to proper python syntax
# test plan
Co-authored-by: Steven Bryan <[email protected]>

# why
- need to set default viewport when running on browserbase. previously, we only defined the default inside the exported `StagehandConfig`
# what changed
- set default viewport to 1288 * 711 when running on browserbase
# test plan
- tested locally
- regression evals
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.
# Releases
## @browserbasehq/[email protected]
### Patch Changes
- [browserbase#1114](browserbase#1114) [`c0fbc51`](browserbase@c0fbc51) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - configure default viewport when running on browserbase
## @browserbasehq/[email protected]
### Patch Changes
- Updated dependencies \[[`c0fbc51`](browserbase@c0fbc51)]:
  - @browserbasehq/[email protected]
## @browserbasehq/[email protected]
### Patch Changes
- Updated dependencies \[[`c0fbc51`](browserbase@c0fbc51)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Updated link in the Getting Started section to point to the correct Quickstart Guide.
# why
Quickstart link in README leads to a non-existent page.
<img width="1556" height="763" alt="image" src="https://github.com/user-attachments/assets/20a1a5b5-8534-43b4-89d5-e3a062b3965a" />
# what changed
Updated quickstart link in README to the correct quickstart address `https://docs.stagehand.dev/first-steps/quickstart`
# test plan
Access new link to quickstart
stagehand upload method
why
stagehand could click, type, scrape… but uploads were messy.
no way to handle hidden inputs, chooser popups, urls, buffers.
no natural language → file input.
no clean support across local + browserbase.
so i built it.
what changed
1. new types
2. new api (observe, UploadResult)
3. file processing
4. integration
test plan
✅ done
🧪 next
notes
usage
impact
stagehand now does uploads properly.
handles messy real-world forms.
natural language → correct element.
works across environments.
less brittle, more useful.
opens up workflows like automated form filling + doc processing.
basically: it can now upload files.
p.s. also my first oss contribution